anomalous data
Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection
In high-stakes sectors such as network security, IoT security, accurately distinguishing between normal and anomalous data is critical due to the significant implications for operational success and safety in decision-making. The complexity is exacerbated by the presence of unlabeled data and the opaque nature of black-box anomaly detection models, which obscure the rationale behind their predictions. In this paper, we present a novel method to interpret the decision-making processes of these models, which are essential for detecting malicious activities without labeled attack data. We put forward the Segmentation Clustering Decision Tree (SCD-Tree), designed to dissect and understand the structure of normal data distributions.
Quantum Autoencoders for Anomaly Detection in Cybersecurity
Senthil, Rohan, Wong, Swee Liang
Anomaly detection in cybersecurity is a challenging task, where normal events far outnumber anomalous ones with new anomalies occurring frequently. Classical autoencoders have been used for anomaly detection, but struggles in data-limited settings which quantum counterparts can potentially overcome. In this work, we apply Quantum Autoencoders (QAEs) for anomaly detection in cybersecurity, specifically on the BPF-extended tracking honeypot (BETH) dataset. QAEs are evaluated across multiple encoding techniques, ansatz types, repetitions, and feature selection strategies. Our results demonstrate that an 8-feature QAE using Dense-Angle encoding with a RealAmplitude ansatz can outperform Classical Autoencoders (CAEs), even when trained on substantially fewer samples. The effects of quantum encoding and feature selection for developing quantum models are demonstrated and discussed. In a data-limited setting, the best performing QAE model has a F1 score of 0.87, better than that of CAE (0.77). These findings suggest that QAEs may offer practical advantages for anomaly detection in data-limited scenarios.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.92)
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Li, Shimiao, Qu, Guannan, Hooi, Bryan, Sekar, Vyas, Kar, Soummya, Pileggi, Larry
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurately estimating states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods, and significantly compromise estimation accuracy. This work advances system identification that combines snapshot-based method with time-series model via Bayesian Integration, to advance cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology error) and targeted false data injection attack (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1min per time tick on the large 2,383-bus system using a laptop CPU.
Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism
Li, Wei, Hu, Limei, Chen, Feng, Yao, Ye
In multi-task adversarial networks, the accurate estimation of unknown parameters in a distributed algorithm is hindered by attacked nodes or links. To tackle this challenge, this brief proposes a low-communication resilient distributed estimation algorithm. First, a node selection strategy based on reputation is introduced that allows nodes to communicate with more reliable subset of neighbors. Subsequently, to discern trustworthy intermediate estimates, the Weighted Support Vector Data Description (W-SVDD) model is employed to train the memory data. This trained model contributes to reinforce the resilience of the distributed estimation process against the impact of attacked nodes or links. Additionally, an event-triggered mechanism is introduced to minimize ineffective updates to the W-SVDD model, and a suitable threshold is derived based on assumptions. The convergence of the algorithm is analyzed. Finally, simulation results demonstrate that the proposed algorithm achieves superior performance with less communication cost compared to other algorithms.
A Mathematical Optimization Approach to Multisphere Support Vector Data Description
Blanco, Víctor, Espejo, Inmaculada, Páez, Raúl, Rodríguez-Chía, Antonio M.
We present a novel mathematical optimization framework for outlier detection in multimodal datasets, extending Support Vector Data Description approaches. We provide a primal formulation, in the shape of a Mixed Integer Second Order Cone model, that constructs Euclidean hyperspheres to identify anomalous observations. Building on this, we develop a dual model that enables the application of the kernel trick, thus allowing for the detection of outliers within complex, non-linear data structures. An extensive computational study demonstrates the effectiveness of our exact method, showing clear advantages over existing heuristic techniques in terms of accuracy and robustness.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection
In high-stakes sectors such as network security, IoT security, accurately distinguishing between normal and anomalous data is critical due to the significant implications for operational success and safety in decision-making. The complexity is exacerbated by the presence of unlabeled data and the opaque nature of black-box anomaly detection models, which obscure the rationale behind their predictions. In this paper, we present a novel method to interpret the decision-making processes of these models, which are essential for detecting malicious activities without labeled attack data. We put forward the Segmentation Clustering Decision Tree (SCD-Tree), designed to dissect and understand the structure of normal data distributions. To further refine these segments, the Gaussian Boundary Delineation (GBD) algorithm is employed to define boundaries within each segmented distribution, effectively delineating normal from anomalous data points.
- Transportation > Air (0.64)
- Information Technology > Security & Privacy (0.61)
Post-Hoc Calibrated Anomaly Detection
Deep unsupervised anomaly detection has seen improvements in a supervised binary classification paradigm in which auxiliary external data is included in the training set as anomalous data in a process referred to as outlier exposure, which opens the possibility of exploring the efficacy of post-hoc calibration for anomaly detection and localization. Post-hoc Platt scaling and Beta calibration are found to improve results with gradient-based input perturbation, as well as post-hoc training with a strictly proper loss of a base model initially trained on an unsupervised loss. Post-hoc calibration is also found at times to be more effective using random synthesized spectral data as labeled anomalous data in the calibration set, suggesting that outlier exposure is superior only for initial training.
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Do, Nguyen, Nguyen, Truc, Hassanaly, Malik, Alharbi, Raed, Seo, Jung Taek, Thai, My T.
Despite a plethora of anomaly detection models developed over the years, their ability to generalize to unseen anomalies remains an issue, particularly in critical systems. This paper aims to address this challenge by introducing Swift Hydra, a new framework for training an anomaly detection method based on generative AI and reinforcement learning (RL). Through featuring an RL policy that operates on the latent variables of a generative model, the framework synthesizes novel and diverse anomaly samples that are capable of bypassing a detection model. These generated synthetic samples are, in turn, used to augment the detection model, further improving its ability to handle challenging anomalies. Swift Hydra also incorporates Mamba models structured as a Mixture of Experts (MoE) to enable scalable adaptation of the number of Mamba experts based on data complexity, effectively capturing diverse feature distributions without increasing the model's inference time. Empirical evaluations on ADBench benchmark demonstrate that Swift Hydra outperforms other state-of-the-art anomaly detection models while maintaining a relatively short inference time. From these results, our research highlights a new and auspicious paradigm of integrating RL and generative AI for advancing anomaly detection.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > South Korea (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- Energy (1.00)
- Health & Medicine > Therapeutic Area (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Security & Privacy (0.45)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.44)
A Novel Spatiotemporal Correlation Anomaly Detection Method Based on Time-Frequency-Domain Feature Fusion and a Dynamic Graph Neural Network in Wireless Sensor Network
Ye, Miao, Jiang, Zhibang, Xue, Xingsi, Li, Xingwang, Wen, Peng, Wang, Yong
Attention-based transformers have played an important role in wireless sensor network (WSN) timing anomaly detection due to their ability to capture long-term dependencies. However, there are several issues that must be addressed, such as the fact that their ability to capture long-term dependencies is not completely reliable, their computational complexity levels are high, and the spatiotemporal features of WSN timing data are not sufficiently extracted for detecting the correlation anomalies of multinode WSN timing data. To address these limitations, this paper proposes a WSN anomaly detection method that integrates frequency-domain features with dynamic graph neural networks (GNN) under a designed self-encoder reconstruction framework. First, the discrete wavelet transform effectively decomposes trend and seasonal components of time series to solve the poor long-term reliability of transformers. Second, a frequency-domain attention mechanism is designed to make full use of the difference between the amplitude distributions of normal data and anomalous data in this domain. Finally, a multimodal fusion-based dynamic graph convolutional network (MFDGCN) is designed by combining an attention mechanism and a graph convolutional network (GCN) to adaptively extract spatial correlation features. A series of experiments conducted on public datasets and their results demonstrate that the anomaly detection method designed in this paper exhibits superior precision and recall than the existing methods do, with an F1 score of 93.5%, representing an improvement of 2.9% over that of the existing models.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
Incremental Gaussian Mixture Clustering for Data Streams
Bhanderi, Aniket, Bhatnagar, Raj
The problem of analyzing data streams of very large volumes is important and is very desirable for many application domains. In this paper we present and demonstrate effective working of an algorithm to find clusters and anomalous data points in a streaming datasets. Entropy minimization is used as a criterion for defining and updating clusters formed from a streaming dataset. As the clusters are formed we also identify anomalous datapoints that show up far away from all known clusters. With a number of 2-D datasets we demonstrate the effectiveness of discovering the clusters and also identifying anomalous data points.
- North America > United States > Maryland > Montgomery County > Bethesda (0.04)
- Asia > China > Beijing > Beijing (0.04)